Implications of Stemness Features in 1059 Hepatocellular Carcinoma Patients from Five Cohorts: Prognosis, Treatment Response, and Identification of Potential Compounds

Simple Summary Cancer stemness has been reported to drive hepatocellular carcinoma (HCC) tumorigenesis and treatment resistance. However, comprehensive interpretations of transcriptomic stemness features in HCC patients have not been conducted in multiple cohorts. Our aim was to interpret clinical and therapeutic implications of transcriptional stemness features and explore potential compounds for HCC treatment. We found that transcriptional stemness indexes (mRNAsi) were independently associated with worse HCC prognosis. The HCC stemness risk model (HSRM) developed in this study significantly predicted prognosis and treatment response in various HCC cohorts. Analysis of two stemness subtypes suggested several liver-specific metabolic pathways, and mutations of TP53 and RB1 were associated with HCC transcriptional stemness. Moreover, we also identified potential compounds that target HCC transcriptional stemness. Our findings comprehensively characterized transcriptional stemness as a risk factor in HCC progression and treatment. Abstract Cancer stemness has been reported to drive hepatocellular carcinoma (HCC) tumorigenesis and treatment resistance. In this study, five HCC cohorts with 1059 patients were collected to calculate transcriptional stemness indexes (mRNAsi) by the one-class logistic regression machine learning algorithm. In the TCGA-LIHC cohort, we found mRNAsi was an independent prognostic factor, and 626 mRNAsi-related genes were identified by Spearman correlation analysis. The HCC stemness risk model (HSRM) was trained in the TCGA-LIHC cohort and significantly discriminated overall survival in four independent cohorts. HSRM was also significantly associated with transarterial chemoembolization treatment response and rapid tumor growth in HCC patients. Consensus clustering was conducted based on mRNAsi-related genes to divide 1059 patients into two stemness subtypes. On gene set variation analysis, samples of subtype I were found enriched with pathways such as DNA replication and cell cycle, while several liver-specific metabolic pathways were inhibited in these samples. Somatic mutation analysis revealed more frequent mutations of TP53 and RB1 in the subtype I samples. In silico analysis suggested topoisomerase, cyclin-dependent kinase, and histone deacetylase as potential targets to inhibit HCC stemness. In vitro assay showed two predicted compounds, Aminopurvalanol-a and NCH-51, effectively suppressed oncosphere formation and impaired viability of HCC cell lines, which may shed new light on HCC treatment.


Introduction
Liver cancer is one of the most common cancers worldwide [1]. Globally, an estimated 906,000 new cases of liver cancer and 830,000 deaths attributable to liver cancer were reported in 2020 [1]. Liver cancer is considered one of the most lethal cancers, with a 5 year survival rate of just 18% [2]. Hepatocellular carcinoma (HCC) accounts for a vast majority of all cases of liver cancer (75-85%) [1]. Tumor resection and ablation are the main strategies for treatment of HCC patients at an early stage (BCLC stage 0 or A); however, ideal candidates for these strategies are relatively limited [3,4]. Moreover, owing to the insidious onset of HCC, the majority of patients are diagnosed at an advanced stage [3,4]. For patients with intermediate-stage tumors (BCLC stage B), transarterial therapy, especially transarterial chemoembolization (TACE), is considered as a standard treatment option. However, a systematic review showed that only half of HCC patients (52.5%) show objective response with TACE [5]. In addition, systemic therapies are recommended for patients with intermediate-and advanced-stage tumors [4]. Despite substantial improvement in the efficacy of HCC systemic therapies, most HCC patients have a dismal prognosis [6][7][8][9]. Therefore, further research on tumorigenesis and treatment of HCC is a key imperative.
Stemness refers to the self-renewal and differentiation potential of normal stem cells [10,11]. Previous studies have reported a subset of cells, known as cancer stem cells (CSCs), that acquire stem-cell-like characteristics and play an important role in the tumorigenesis and progression of various cancers [12,13]. CSCs are more likely to drive cancer metastasis, recurrence, and therapeutic resistance than other tumor cells [14]. In the context of HCC, such CSCs may also equip tumors with various characteristics to fit in the tumor microenvironment and acquire chemoresistance, though the CSCs model in HCC has been heavily debated [15][16][17][18]. The dedifferentiated cells in HCC may also arise from adult hepatocytes, which could dedifferentiate into progenitor-like precursor cells that ultimately transform into HCC [17]. These findings implicated that the oncogenic dedifferentiation of HCC may derive from various sources. In addition, many studies have also shown that cancer stemness of HCC is associated with dysregulation of pathways such as hypoxia, Wnt, Notch, and NF-κB signaling pathways [19][20][21][22]. This suggests that cancer stemness of HCC may be driven by a transcriptional network that comprises multiple altered molecular pathways.
The one-class logistic regression (OCLR) machine learning algorithm was proved as a robust method to quantify the cancer stemness by two different indices that used the pluripotent stem cell samples of a PCBC data set to train stemness signatures based on mRNA expression or DNA methylation and, subsequently, applied them to score tumor samples [23,24]. The mRNA expression-based stemness index (mRNAsi) is reflective of gene expression profile and mDNAsi was reflective of epigenetic characteristics [23]. Several studies have validated the potential of applying mRNAsi in various tumors by integrating multiple data sets to explore the clinical and biological implications of the transcriptomic stemness feature [25][26][27]. Therefore, systematic assessment of such a transcriptomic stemness index in multiple cohorts may facilitate in-depth characterization of HCC. Moreover, pharmacogenomic data sets and multiple liver cancer cell line models have also been used to screen novel compounds for HCC treatment [28]. Meanwhile, cancer stemness has been reported to induce resistance of HCC to different drugs [29][30][31]. Considering the above facts, it is worthy of screening compounds targeting HCC cancer stemness by leveraging the HCC stemness index and public pharmacogenomic model.
In the current study, we included data of 1059 HCC patients belonging to five independent cohorts to calculate mRNAsi of HCC samples using the OCLR machine learning algorithm. The main findings of this research were summarized in Table S1. Briefly, we developed an HCC stemness risk model (HSRM) to discriminate overall survival (OS) in the TCGA-LIHC cohort and validated its efficacy in four independent HCC cohorts. The model was also found to be a potential indicator of rapid tumor growth and TACE treatment response of HCC patients. Based on mRNAsi-related genes, we then conducted consensus clustering to divide these HCC patients into two stemness subtypes with distinct functional annotations, genomic variations, and clinical features. Finally, Connectivity Map (CMap) and the Liver Cancer Model Repository (LIMORE) were used to screen anti-stemness

Calculation of the Transcriptional Stemness Index (mRNAsi)
The transcriptomic data of human 229 cell samples from the Progenitor Cell Biology Consortium (PCBC) (https://www.synapse.org/, accessed on 26 May 2021) were downloaded by the R package "synapser". The OCLR algorithm was used to calculate the mRNAsi, a previously reported machine learning algorithm, based on transcriptomic data of human stem (78 samples) and non-stem cells (151 samples) [23,24]. The R package "gelnet" was used to train the stemness signatures, and it was applied to score HCC samples from five cohorts.

Classification of HCC Patients Based on mRNAsi-Related Genes
To identify mRNAsi-related genes in the TCGA-LIHC cohort, correlations between gene expression levels and mRNAsi were assessed using Spearman correlation coefficients; |R| > 0.5 was considered as the threshold for determining mRNAsi-related genes.
By using the R package "ConsensusClusterPlus", unsupervised consensus clustering was conducted to classify 1059 HCC patients from five HCC cohorts based on the expression of mRNAsi-related genes [38]. By sampling 80% of the data in each iteration, the clustering was performed with 1000 iterations. The consensus heatmap, the proportion of ambiguous clustering (PAC) algorithm, and area under the cumulative distribution function (CDF) curves were used to comprehensively identify the optimal number of categories [39].
To identify differentially enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways of the stemness subtypes, gene set variation analysis (GSVA) was performed by the "GSVA" R package [40]. The enrichment scores of KEGG pathways were compared between the two stemness subtypes using the R package "limma", and a Benjamini-Hochberg false discovery rate (FDR) < 0.05 and |log2FC| > 0.35 were used as cut off values to identify the most differentially enriched KEGG pathways [41].
The somatic mutation data from the LIHC-TCGA (MAF format, 353 patients) and LIRI-JP (simple somatic mutation format, 229 patients) cohorts were downloaded to compare the somatic mutations between the two stemness subtypes. The frequencies and types of mutations were analyzed and visualized by the R package "Maftools" [42].

Construction and Validation of the HSRM
Transcriptome data and OS information from the TCGA-LIHC cohort were used to construct the HSRM. To identify prognostic hub genes among mRNAsi-related genes, least absolute shrinkage and selection operator (LASSO) regression analysis was performed using the R package "glmnet" [43]. Then, HSRM was constructed by multivariate Cox regression analysis based on these prognostic hub genes. The LIRI-JP, CHCC-HBV, GSE14520, and GSE54236 cohorts were used to test the predictive efficiency of HSRM by Kaplan-Meier plots and time-dependent receiver operating characteristic (ROC) curve analysis.
Transcriptome data and treatment response information from the GSE104580 cohort were used to test the efficacy of HSRM in discriminating responders and non-responders to TACE treatment. Transcriptome data and calculated tumor volume doubling time of the GSE54236 cohort were used to explore the association between HSRM and rapid tumor growth in HCC patients.

CMap Analysis
The CMap database (https://clue.io/, accessed on 1 July 2021) was used to screen potential compounds targeting the molecular pathways and genes associated with the stemness features of HCC patients. A list of differentially expressed genes (DEGs) were used to query the CMap database. Based on the median value of the mRNAsi, 1059 HCC patients from five cohorts were divided into high and low mRNAsi groups. By using the R package "limma", DEGs between the high and low mRNAsi groups were selected with an FDR < 0.05 and |fold change (FC)| > 1 as cut off values. For functional annotation of the DEGs, enrichment analysis was conducted to identify significant KEGG pathways using the R package "clusterprofiler". Gene set enrichment analysis (GSEA) was conducted to identify significant differentially enriched KEGG pathways between the high and low mRNAsi groups using the R package "clusterprofiler". The CMap connectivity score is a standardized measure ranging from −100 to 100. In the present study, compounds with a connectivity score ≤ −95 were identified as potential compounds and sorted by shared mechanisms of action (MOAs).

Cell Lines
The HCC cell lines, Huh7 and Hep3B, were obtained from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). All cell lines were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS) at 37 • C in an incubator with 5% CO 2 .

Statistical Analysis
All statistical analyses were performed using the R statistical software (version 4.0). Between-group differences were assessed using the Wilcoxon test. Multi-group comparisons were performed using the Kruskal-Wallis test. Spearman's correlation test was used to assess correlation between two continuous variables. Contingency table variables were analyzed by Chi-squared test or Fisher's exact test. Survival analysis was performed by the Kaplan-Meier method using the R package "survival". Time-dependent receiver ROC curves analysis was performed by the R package "timeROC". Cox regression models (univariate or multivariate) were established using the R package "survival". To determine independent prognostic factors for overall survival (OS), Cox regression methods were used and factors with a p < 0.2 in univariate analyses were kept for multivariate analyses. The median value was used as the cut off to classify HCC patients into a high mRNAsi group and a low mRNAsi group in Kaplan-Meier survival analysis and Cox regression analysis. The median value was used as the cut off to classify HCC patients into a high HSRM group and a low HSRM group in Kaplan-Meier survival analysis. Differential analysis was conducted via the R package "limma". The accuracy of predictive models was evaluated by the area under the curve (AUC). p-Values < 0.05 were considered indicative of statistical significance.

Association of Transcriptional Stemness Index with Clinical and Molecular Features
As shown in Figure 1, five HCC cohorts with complete OS information and transcriptomic data were collected for systematic analysis of stemness ( Table 1). The mRNAsi of HCC samples was calculated using the OCLR machine learning algorithm (Supplementary  Materials Table S2).   We first investigated the association of mRNAsi with various clinical features in the TCGA-LIHC cohort ( Figure 2A). We observed no significant association of mRNAsi with age or sex of the HCC patients ( Figure 2B). High mRNAsi was associated with advanced histological grade and TNM stage. HCC patients with a high level of alpha fetal protein (AFP; biomarker of HCC) had a significantly higher mRNAsi level ( Figure 2B). In addition, primary tumor of HCC patients showed significantly higher mRNAsi compared with paratumoral normal tissues ( Figure 2B). We then explored the prognostic relevance of mRNAsi in HCC patients using Kaplan-Meier analysis. HCC patients with higher mRNAsi scores showed significantly worse OS and progression-free survival (PFS) ( Figure 2C,D).
We first investigated the association of mRNAsi with various clinical features in the TCGA-LIHC cohort ( Figure 2A). We observed no significant association of mRNAsi with age or sex of the HCC patients ( Figure 2B). High mRNAsi was associated with advanced histological grade and TNM stage. HCC patients with a high level of alpha fetal protein (AFP; biomarker of HCC) had a significantly higher mRNAsi level ( Figure 2B). In addition, primary tumor of HCC patients showed significantly higher mRNAsi compared with paratumoral normal tissues ( Figure 2B). We then explored the prognostic relevance of mRNAsi in HCC patients using Kaplan-Meier analysis. HCC patients with higher mRNAsi scores showed significantly worse OS and progression-free survival (PFS) (Figure 2C,D).
As a transcriptional index, mRNAsi is reflective of gene expressions associated with HCC stemness [23]. We thus assessed the correlation between mRNAsi and gene expression levels, among which CCNB1 was the most mRNAsi-associated gene ( Figure 2E). In addition, expression levels of several well-characterized cancer stemness regulators, such as FOXM1 and EZH2, also showed a strong correlation with mRNAsi among the HCC patients of the TCGA-LIHC cohort ( Figure 2E) [44,45]. As a transcriptional index, mRNAsi is reflective of gene expressions associated with HCC stemness [23]. We thus assessed the correlation between mRNAsi and gene expression levels, among which CCNB1 was the most mRNAsi-associated gene ( Figure 2E). In addition, expression levels of several well-characterized cancer stemness regulators, such as FOXM1 and EZH2, also showed a strong correlation with mRNAsi among the HCC patients of the TCGA-LIHC cohort ( Figure 2E) [44,45].

Training and Validation of HSRM of HCC Stemness in Five HCC Cohorts
In the TCGA-LIHC cohort, we found mRNAsi might be a risk factor of shorter OS in HCC patients ( Figure 2C). We then evaluated prognostic factors for OS of HCC patients in the TCGA-LIHC cohort by Cox regression methods. In the univariate analysis, HCC OS was statistically associated with the pathological stage, tumor stage, and mRNAsi ( Table 2, p < 0.05). Multivariate Cox regression analysis showed that mRNAsi was significantly associated with HCC OS ( Table 2, p < 0.05). These results suggested that mRNAsi was independently associated with OS in HCC patients. Then, the TCGA-LIHC cohort was used to train the HSRM, which was subsequently validated in other four independent cohorts ( Table 1).
As the training set, we first identified 626 genes as mRNAsi-related genes (|R| > 0.5) based on the mRNAsi and transcriptome data of TCGA-LIHC cohorts. Then, 13 prognostic hub genes were identified from 626 mRNAsi-related genes using the LASSO regression models ( Figure 3A,B). Among these hub genes, only RAMP3 showed a negative correlation with mRNAsi, while the other genes showed a positive correlation with mRNAsi including stem/progenitor cells regulators such as KPNA2 and YBX1 ( Figure 3C,D) [46][47][48][49][50]. Then, multivariate Cox regression analysis was conducted to develop the HSRM based on the expression levels of 13 prognostic hub genes ( Figure 3E). The formula for the HSRM was as follows: (Table S3). The predictive performance of HSRM in the TCGA-LIHC cohort was evaluated by time-dependent ROC curve analysis; the AUC values of 1 year, 2 year, and 3 year OS were 0.786, 0.742, and 0.725, respectively ( Figures 3F and S1A). ver, we also combined HSRM with TNM and the Barcelona Clinic Liver Cancer (BCLC) staging system to improve the discrimination power ( Figures S2 and S3). In our results, only BCLC stage may significantly improve the performance of HSRM. Due to the data limitation, TNM and BCLC stage data were not available in every HCC cohort in this study (Table 1); thus, more HCC cohorts should be used to validate these results in the future study.  Then, we used four independent cohorts (692 patients from different regions) to validate the predictive efficacy of HSRM. In line with the training cohort, HSRM significantly discriminated patients with distinct prognosis ( Figure 3G,H), indicating the applicability of this model for HCC induced by various risk factors. Moreover, we also assessed the efficiency of HSRM in each validation cohort by time-dependent ROC curve analysis; the AUC values of 1 year, 2 year, and 3 year OS in these cohorts demonstrated that HSRM could robustly predict the prognosis of HCC patients ( Figure 3K-N and Figure S1B-E). Moreover, we also combined HSRM with TNM and the Barcelona Clinic Liver Cancer (BCLC) staging system to improve the discrimination power ( Figures S2 and S3). In our results, only BCLC stage may significantly improve the performance of HSRM. Due to the data limitation, TNM and BCLC stage data were not available in every HCC cohort in this study (Table 1); thus, more HCC cohorts should be used to validate these results in the future study.
Moreover, the tumor volume doubling time of HCC patients from the GSE54236 cohort were calculated based on imaging data [36]. As shown in the Figure 4A, a higher score of HSRM may indicate a faster tumor growth rate in HCC patients. GSE104580 is a cohort publicly available with information pertaining to the therapeutic response to TACE. TACE is considered as part of standard treatment options for HCC [4]. In our results, HSRM significantly predict response to TACE in HCC patients with AUC > 0.7 ( Figure 4B,C). These results suggested that in addition to the prognostic relevance, HSRM may also be associated with disease progression and treatment response of HCC patients. Moreover, the tumor volume doubling time of HCC patients from the GSE54236 cohort were calculated based on imaging data [36]. As shown in the Figure 4A, a higher score of HSRM may indicate a faster tumor growth rate in HCC patients. GSE104580 is a cohort publicly available with information pertaining to the therapeutic response to TACE. TACE is considered as part of standard treatment options for HCC [4]. In our results, HSRM significantly predict response to TACE in HCC patients with AUC > 0.7 (Figure 4B,C). These results suggested that in addition to the prognostic relevance, HSRM may also be associated with disease progression and treatment response of HCC patients.

Consensus Clustering Divided HCC Patients into Two Stemness Subtypes with Distinct Functional Annotation and Somatic Mutation Pattern
To explore the underlying mechanisms contributing to the HCC stemness, we conducted unsupervised consensus clustering to stratify 1059 HCC patients of five cohorts based on the expression levels of the mRNAsi-related genes mentioned above. According to the consensus heatmap, the PAC algorithm and the CDF curves (k value = 2), two stemness subtypes were identified ( Figure 5A-C). We found that subtype I (456 patients, 43%) exhibited higher mRNAsi than subtype II (603 patients, 57%) ( Figure 5D,E).
First, GSVA was performed to investigate the functional differences between the two stemness subtypes. As shown in the boxplots, four pathways (i.e., DNA replication, mismatch repair, homologous recombination, and cell cycle) showed a positive correlation with stemness subtype I ( Figure 5F). Conversely, a series of metabolic pathways, such as primary bile acid biosynthesis and fatty acid metabolism, were found to be enriched in the HCC samples of subtype II ( Figure 5F). These results indicated that the various metabolisms of stemness subtype I tumors might be undermined, which suggests potential links with cancer stemness in HCC.

Consensus Clustering Divided HCC Patients into Two Stemness Subtypes with Distinct Functional Annotation and Somatic Mutation Pattern
To explore the underlying mechanisms contributing to the HCC stemness, we conducted unsupervised consensus clustering to stratify 1059 HCC patients of five cohorts based on the expression levels of the mRNAsi-related genes mentioned above. According to the consensus heatmap, the PAC algorithm and the CDF curves (k value = 2), two stemness subtypes were identified ( Figure 5A-C). We found that subtype I (456 patients, 43%) exhibited higher mRNAsi than subtype II (603 patients, 57%) ( Figure 5D,E).
First, GSVA was performed to investigate the functional differences between the two stemness subtypes. As shown in the boxplots, four pathways (i.e., DNA replication, mismatch repair, homologous recombination, and cell cycle) showed a positive correlation with stemness subtype I ( Figure 5F). Conversely, a series of metabolic pathways, such as primary bile acid biosynthesis and fatty acid metabolism, were found to be enriched in the HCC samples of subtype II ( Figure 5F). These results indicated that the various metabolisms of stemness subtype I tumors might be undermined, which suggests potential links with cancer stemness in HCC.

Figure 5. Identification of two HCC stemness subtypes with distinct functional annotations. (A-C)
Consensus clustering was conducted to classify HCC stemness subtypes. Consensus clustering matrix (A), cumulative distribution function (C-F) curves (B) and the relative change in the area under the CDF curve (C) from k = 2 to 6 was used to determine the optimal cluster number as 2.
(D) Heatmap showing the expression levels of mRNAsi-related genes between the two HCC stemness subtypes. The color red represents genes that were positively correlated with mRNAsi, and the color blue represents genes that were negatively correlated with mRNAsi. (E) Violin plots of the mRNAsi of HCC patients classified by HCC stemness subtypes. (F) Boxplots illustrating the KEGG pathways with significantly differential enrichment scores, which were evaluated by GSVA analysis between HCC stemness subtypes (***, p ≤ 0.001).
Subsequently, we downloaded the somatic mutation data of 229 and 353 HCC patients from the LIRI-JP and TCGA-LIHC cohorts, respectively. The somatic mutation pattern of each cohort was visualized with the R package "Maftools" ( Figure 6A). TP53 was found to be the most frequently mutated gene of stemness subtype I in both cohorts, whereas the CTNNB1 was the most frequently mutated gene of stemness subtype II (Figure 6A). When we pooled data of LIRI-JP and TCGA-LIHC cohorts, only three genes were identified as differentially mutated genes by somatic mutation analysis (p-adjust < 0.05, Figure 6B). We found mutation of RB1, a well-known tumor suppressor, was also enriched in tumors of stemness subtype I with a relatively low mutation rate ( Figure 6B). To investigate the association between mRNAsi and somatic mutations of TP53, RB1, and CTNNB1, we then compared mRNAsi between wild-type and mutated HCC samples. We found the mRNAsi of TP53 mutant or RB1 mutant HCC samples were significantly higher than wild-type HCC samples, while there was no significant difference between the CTNNB1 wild-type and the CTNNB1 mutant ( Figure 6C-E). These findings demonstrate that HCC stemness might be linked with TP53 or RB1 tumor mutations. (A-C) Consensus clustering was conducted to classify HCC stemness subtypes. Consensus clustering matrix (A), cumulative distribution function (C-F) curves (B) and the relative change in the area under the CDF curve (C) from k = 2 to 6 was used to determine the optimal cluster number as 2.
(D) Heatmap showing the expression levels of mRNAsi-related genes between the two HCC stemness subtypes. The color red represents genes that were positively correlated with mRNAsi, and the color blue represents genes that were negatively correlated with mRNAsi. (E) Violin plots of the mRNAsi of HCC patients classified by HCC stemness subtypes. (F) Boxplots illustrating the KEGG pathways with significantly differential enrichment scores, which were evaluated by GSVA analysis between HCC stemness subtypes (***, p ≤ 0.001).
Subsequently, we downloaded the somatic mutation data of 229 and 353 HCC patients from the LIRI-JP and TCGA-LIHC cohorts, respectively. The somatic mutation pattern of each cohort was visualized with the R package "Maftools" (Figure 6A). TP53 was found to be the most frequently mutated gene of stemness subtype I in both cohorts, whereas the CTNNB1 was the most frequently mutated gene of stemness subtype II ( Figure 6A). When we pooled data of LIRI-JP and TCGA-LIHC cohorts, only three genes were identified as differentially mutated genes by somatic mutation analysis (p-adjust < 0.05, Figure 6B). We found mutation of RB1, a well-known tumor suppressor, was also enriched in tumors of stemness subtype I with a relatively low mutation rate ( Figure 6B). To investigate the association between mRNAsi and somatic mutations of TP53, RB1, and CTNNB1, we then compared mRNAsi between wild-type and mutated HCC samples. We found the mRNAsi of TP53 mutant or RB1 mutant HCC samples were significantly higher than wild-type HCC samples, while there was no significant difference between the CTNNB1 wild-type and the CTNNB1 mutant ( Figure 6C-E). These findings demonstrate that HCC stemness might be linked with TP53 or RB1 tumor mutations.

Identification of Potential Compounds Targeting Transcriptional Stemness of HCC
In recent years, high-throughput technologies have been applied to investigate the effects of compounds against various cancer types. We first used the OCLR algorithm to calculate mRNAsi for each tumor sample from all 1059 HCC patients from the five cohorts collected in the current study. Then, we identified 209 DEGs from the high mRNAsi and low mRNAsi groups of the HCC samples using the R package "limma". Consistent with the GSVA between two stemness subtypes, pathway enrichment analysis and GSEA demonstrated the activation of cell cycle pathway and DNA replication pathway in high mRNAsi group, while several metabolic pathways were downregulated in this group ( Figure 7A,B). Further, we sought to screen candidate compounds using the 209 DEGs as input in the CMap database, which is a comprehensive data set containing gene expression profiles of multiple cell lines tested by nearly 5000 compounds [51]. Our results identified 81 compounds as potential candidate compounds ( Figure 7C). Among these, topoisomerase inhibitors, CDK inhibitors, and HDAC inhibitors were top hits of MOAs, indicating that drugs targeting these molecules may inhibit transcriptional cancer stemness of HCC ( Figure 7C).

Identification of Potential Compounds Targeting Transcriptional Stemness of HCC
In recent years, high-throughput technologies have been applied to investigate the effects of compounds against various cancer types. We first used the OCLR algorithm to calculate mRNAsi for each tumor sample from all 1059 HCC patients from the five cohorts collected in the current study. Then, we identified 209 DEGs from the high mRNAsi and low mRNAsi groups of the HCC samples using the R package "limma". Consistent with the GSVA between two stemness subtypes, pathway enrichment analysis and GSEA demonstrated the activation of cell cycle pathway and DNA replication pathway in high mRNAsi group, while several metabolic pathways were downregulated in this group ( Figure 7A,B). Further, we sought to screen candidate compounds using the 209 DEGs as input in the CMap database, which is a comprehensive data set containing gene expression profiles of multiple cell lines tested by nearly 5000 compounds [51]. Our results identified 81 compounds as potential candidate compounds ( Figure 7C). Among these, topoisomerase inhibitors, CDK inhibitors, and HDAC inhibitors were top hits of MOAs, indicating that drugs targeting these molecules may inhibit transcriptional cancer stemness of HCC ( Figure 7C). We searched experimental and clinical evidence of these compounds in PubMed (https://www.ncbi.nlm.nih.gov/pubmed/, accessed on 4 July 2021) and found all these predicted topoisomerase inhibitors have been researched in HCC treatment (Table S4). Among the unstudied CDK and HDAC inhibitors, Aminopurvalanol-a and NCH-51 had the most significant cMAP scores (Table S4). We thus selected these two compounds to perform in vitro experiments using two commonly used liver cancer cell lines: Huh7 and Hep3B. The results of the CCK8 assay demonstrated that both Aminopurvalanol-a and NCH-51 effectively suppressed proliferation and viability of Huh7 and Hep3B cells ( Figure 8A-D). In addition, sphere-forming ability was also inhibited by Aminopurvalanola and NCH-51 in both Huh7 and Hep3B cells ( Figure 8E,F).

Figure 7.
Screening of potential compounds targeting HCC cancer stemness using the cMAP data set. Pathway enrichment analysis (A) of DEGs between the high and low mRNAsi groups of 1059 HCC patients. (B) GSEA between the high and low mRNAsi groups of 1059 HCC patients. (C) Heatmap exhibiting the potential compounds (columns) that share mechanisms of action (MOAs) (rows). All potential compounds had an enrichment score ≤ −95.
We searched experimental and clinical evidence of these compounds in PubMed (https://www.ncbi.nlm.nih.gov/pubmed/, accessed on 4 July 2021) and found all these predicted topoisomerase inhibitors have been researched in HCC treatment (Table S4). Among the unstudied CDK and HDAC inhibitors, Aminopurvalanol-a and NCH-51 had the most significant cMAP scores (Table S4). We thus selected these two compounds to perform in vitro experiments using two commonly used liver cancer cell lines: Huh7 and Hep3B. The results of the CCK8 assay demonstrated that both Aminopurvalanol-a and NCH-51 effectively suppressed proliferation and viability of Huh7 and Hep3B cells (Figure 8A-D). In addition, sphere-forming ability was also inhibited by Aminopurvalanol-a and NCH-51 in both Huh7 and Hep3B cells ( Figure 8E,F).
In addition, 13 compounds targeting topoisomerase, CDK, and HDAC were selected to assess their response in 81 liver cancer cell line models in LIMORE, a pharmacogenomics database of liver cancer [28]. As shown in Figure 8G, higher mRNAsi of liver cancer cell lines were associated with a stronger response to these compounds. Further studies are required to verify the potency of these compounds with selective target pathways.  In addition, 13 compounds targeting topoisomerase, CDK, and HDAC were selected to assess their response in 81 liver cancer cell line models in LIMORE, a pharmacogenomics database of liver cancer [28]. As shown in Figure 8G, higher mRNAsi of liver cancer cell lines were associated with a stronger response to these compounds. Further studies are required to verify the potency of these compounds with selective target pathways.

Discussion
By acquiring stem-cell-like characteristics, the oncogenic dedifferentiation may drive tumorigenesis, metastasis, tumor recurrence and resistance to cancer treatment [12][13][14]. In the current study, transcriptome data of HCC cohorts were used to calculate mRNAsi. Among patients of TCGA-LIHC cohort, higher mRNAsi showed a correlation with higher histological grade, advanced TNM stage, and higher AFP levels. High mRNAsi was also associated with poor prognosis of HCC patients and multivariate Cox regression analysis demonstrated that mRNAsi was an independent prognostic factor for HCC OS. We identified 626 mRNAsi-related genes by assessing the correlation between gene expression levels and mRNAsi scores. Subsequently, we constructed the HSRM with TCGA-LIHC cohort as the training data set and validated it in the other four HCC cohorts. The risk score of HSRM was a robust predictor of OS of HCC patients. In addition, HSRM was also significantly associated with TACE treatment response and rapid tumor growth. Based on the gene expression levels of mRNAsi-related genes, we classified 1059 patients into two stemness subtypes with differential mRNAsi levels. The two stemness subtypes exhibited distinct molecular pathway enrichment and somatic mutation patterns. Moreover, potential compounds targeting cancer stemness of HCC were identified by interrogating the CMap database. Finally, CDK inhibitor (CDKi) Aminopurvalanol-a and HDAC inhibitor (HDACi) NCH-51 were selected to conduct in vitro experiments to further test their potential role in HCC treatment.
In the current study, the OCLR machine learning algorithm was used to assess stemness of HCC tissues. Based on DEGs between CSCs and non-CSCs, several prognostically relevant cancer stemness scores have been developed in the context of various tumors [52][53][54][55][56]. Compared with these studies, the OCLR machine learning algorithm was derived from a more comprehensive gene set and tissue origins [23]. Due to the fact of its flexibility and versatility, the OCLR machine learning algorithm has been used to investigate cancer stemness and relevant gene signatures in various tumor types including prostate cancer glioblastoma, breast cancer, and esophageal cancer [25,26,57,58].
Over the last decade, several prognostic prediction models for HCC patients have been developed based on transcriptomic data [59][60][61]. In the current study, we found mRNAsi was an independent prognostic factor for HCC OS in the TCGA-LIHC cohort. The HSRM developed from the TCGA-LIHC cohort could robustly predict OS in four independent HCC cohorts, indicating the applicability of this model to HCC patients of different races and etiologies. Moreover, previous studies indicated that the CSCs subset and stemness-associated genes may confer resistance to TACE treatment [62,63]. In our study, we observed an association of HSRM with response to TACE treatment. These results prompted us to explore the potential compounds that could target cancer stemness of HCC.
A recent study demonstrated that suppression of a series of liver-specific metabolic modules in HCC and the deficiency of the urea cycle may promote expansion of cancer stem cells [64]. The transcriptomic profile of the two subtypes was analyzed by GSVA, and we found enrichment of a series of metabolic pathways in subtype II, which possessed significant lower mRNAsi levels than subtype I. These results indicated that metabolic pathways, such as primary bile acid biosynthesis and fatty acid metabolism, were intertwined with HCC cancer stemness. TP53, a well-known tumor suppressor, is one of the most frequently mutated genes in HCC patients; in addition, TP53 mutation is associated with the poor prognosis of HCC patients [32]. In a previous study, human pluripotent stem cells were found to recurrently acquire and expand negative TP53 mutations, which conferred selective advantage during cell culture [65]. In our study, somatic mutation analysis in both the TCGA-LIHC and LIRI-JP cohorts demonstrated significant enrichment of the TP53 mutation in stemness subtype I. The mRNAsi levels of HCC patients with mutant TP53 were also significantly higher than the mRNAsi levels of other HCC patients.
In this study, we also explored the infiltration levels of immune cells in HCC samples using different algorithms. However, there was no solid evidence of any strong correlation between HCC cancer stemness and immune infiltration. Although we observed significant association between stemness subtypes and infiltration level of several immune cells, the infiltration difference was relatively weak ( Figure S4). These results were consistent with a previous study that found a weak correlation of cancer stemness with anticancer immunity and mutation load in HCC [66].
Previous studies showed stemness of HCC contributing to sorafenib and TACE treatment resistance [67,68], and controlling stemness may help to increase their treatment response. As mRNAsi is a stemness index derived from transcriptional features, we identified potential compounds and mechanisms that may reduce mRNAsi transcriptionally by interrogating the cMAP database. Topoisomerase, CDK, and HDAC inhibitors ranked as the top three mechanisms in the MOA analysis, in which 34 compounds were identified. We also found a strong association between mRNAsi of HCC cell lines and the response of compounds targeting topoisomerase, CDK, and HDAC in the LIMORE data set. These molecules were closely associated with cell cycle and DNA replication, which constitutively contribute to stem cell self-renewal [69][70][71].
Sorafenib remained as standard of care for advanced HCC for a decade since its approval in 2007. After, the combination of atezolizumab and bevacizumab was the first treatment option that prolonged OS of advanced HCC. During this period, novel CDKi and HDACi have been continuously developed for treatment of HCC [72][73][74], and a series of clinical trials were undertaken to overcome sorafenib resistance [75][76][77]. Similarly, clinical trials (i.e., NCT03024437 and NCT03280563) are being conducted to explore whether CDKi and HDACi have a synergistic antitumor effect in combination with atezolizumab and bevacizumab in other tumor types. These studies suggest the potential for combinations of HCC first-line therapies with novel CDKi or HDACi, which may provide clinical benefits for more patients. At present, there is no evidence of the efficacy of Aminopurvalanol-a (CDKi) and NCH51 (HDACi) in the HCC treatment. To widen the spectrum of candidate compounds for HCC treatment, Aminopurvalanol-a and NCH-51 were selected to conduct in vitro experiments.
However, further studies are required to overcome the limitations of current research. First, dedifferentiated cells may arise from bona fide stem and progenitor cells or by dedifferentiation from non-stem cancer cells. It is not clear whether the transcriptional stemness index derived from bulk tumor samples is representative of these rare cell populations [23,66,78]. Second, more samples from larger multi-center HCC cohorts should be used to validate the predictive efficacy of the HSRM, especially for treatment responses. Third, further studies are required to explore the experimental and clinical value of Aminopurvalanol-a and NCH51 in HCC treatment.

Conclusions
We systemically evaluated the stemness features in five independent cohorts of HCC patients and deciphered the associations between molecular pathways, somatic variations, and HCC stemness subtypes. The HSRM showed stable performance in discriminating the OS of HCC patients. We screened potential compounds to target HCC cancer stemness and further studies are warranted to explore their therapeutic value.
Supplementary Materials: The following supporting information can be downloaded at: https:// www.mdpi.com/article/10.3390/cancers14030563/s1, Figure S1: Time-dependent receiver operating characteristic (ROC) curves demonstrating the clinical significance of HSRM in predicting the OS in five HCC cohorts; Figure S2: Time-dependent receiver operating characteristic (ROC) curves to compare the prognostic accuracy of HSRM with TNM stage in predicting the OS in four HCC cohorts. TNM stages were transformed into numeric codes before they were entered into the COX regression analysis. Numeric codes are as follows: stage I = 1, stage II = 2, stage III = 3, and stage IV = 4; Figure S3: Time-dependent receiver operating characteristic (ROC) curves to compare the prognostic accuracy of HSRM with the Barcelona Clinic Liver Cancer (BCLC) staging system in predicting the OS in two HCC cohorts. BCLC stages were transformed into numeric codes before they were entered into the COX regression analysis. Numeric codes are as follows: stage 0 = 0, stage 1 = 1, stage 2 = 2, and stage 3 = 3; Figure S4: Deconvolution algorithms Cibersort and ssGSEA of 27 knowledge-based functional gene expression signatures (Fges) were used to compare immune infiltration between stemness subtype I and II of HCC patients in five cohorts; Table S1: Summary of the main findings in current research; Table S2: Stemness index (mRNAsi) of 1059 HCC patients from five cohorts; Table S3: Thirteen HSRM hub genes selected by LASSO regression models; Table S4: Topoisomerase inhibitors, CDK inhibitors, and HDAC inhibitors predicted to transcriptionally inhibit cancer stemness of HCC.   [33]. Data from the GSE14520 (GPL3921), GSE54236, and GSE104580 cohorts are available in the GEO repository (https://www.ncbi.nlm.nih.gov/geo/, accessed on 20 May 2021). The transcriptomic data of 229 human cell samples are available in the PCBC repository (https: //www.synapse.org/, accessed on 26 May 2021). Drug responses data and transcriptomes of 81 liver cancer cell line models are available from the LIMORE website (https://www.picb.ac.cn/limore/, accessed on 26 June 2021).